Abstract: Least-squares coefficients for multiple-regression models may 
be unstable when the independent variables are highly correlated. Ridge 
regression is a biased estimation procedure that produces stable estimates 
of the coefficients. Ridge regression is discussed, and a computer program 
for calculating the ridge coefficients is presented. 
Multiple-regression models are widely used in forestry. In some
studies, the independent variables are highly correlated. In this
case the least-squares coefficients may be too large in absolute
value, and the signs may reverse with small changes in the data.
With highly correlated data, one should consider estimation methods
that reduce the effects of the correlation and produce stable
regression coefficients (Marquardt and Snee 1975).

The purpose of this note is to discuss ridge-regression methods and
present a computer program for ridge regression. A list of
references is also given.


Ridge Regression 


The observational equations for a multiple-regression model can be
written as

$$Y = X\beta + \varepsilon$$

in which $Y$ is the $n \times 1$ vector of observations, $X$ is the
$n \times p$ matrix of independent variables, $\beta$ is a
$p \times 1$ vector of unknown parameters, and $\varepsilon$ is the
$n \times 1$ vector of errors. It is assumed that
$E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^{2} I$.
The least-squares estimate of $\beta$ is

$$\hat{\beta} = (X'X)^{-1}X'Y. \qquad [1]$$


For convenience, we assume that $X'X$ and $X'Y$ are in the
correlation form. Methods of scaling $X'X$ and $X'Y$ to the
correlation form are discussed by Draper and Smith (1966, p. 147).
It is well known that $\hat{\beta}$ is the best linear unbiased
estimate of $\beta$. However, when the predictor variables are
highly correlated, the average distance from $\hat{\beta}$ to
$\beta$ is large. In particular,

$E[(\hat{\beta}-\beta)'(\hat{\beta}-\beta)]$ is large.
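
The size of this expected squared distance can be made explicit. The following is a sketch that relies on the standard least-squares result $\mathrm{Var}(\hat{\beta}) = \sigma^{2}(X'X)^{-1}$; the symbol $\lambda_i$ (the eigenvalues of $X'X$) is notation introduced here for illustration, and the numerical values are the eigenvalues printed for the sample problem in the appendix.

```latex
E\left[(\hat{\beta}-\beta)'(\hat{\beta}-\beta)\right]
  = \sigma^{2}\,\operatorname{tr}\,(X'X)^{-1}
  = \sigma^{2}\sum_{i=1}^{p}\frac{1}{\lambda_{i}}
% Sample problem (p = 3):
%   1/2.8993 + 1/0.0869 + 1/0.0138 \approx 0.34 + 11.51 + 72.46 \approx 84.3
% Uncorrelated predictors in correlation form (all \lambda_i = 1):
%   1 + 1 + 1 = 3
```

A single small eigenvalue therefore dominates the sum, which is why near-dependencies among the predictors inflate the least-squares coefficients.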


Hoerl and Kennard (1970a) suggested that the estimator

$$\beta^{*} = (X'X + kI)^{-1}X'Y \qquad [2]$$

be used when the independent variables are highly correlated. The
estimate $\beta^{*}$ is called the ridge estimator. If
$\beta'\beta$ is bounded, there exists a value of $k>0$ such that

$$E[(\beta^{*}-\beta)'(\beta^{*}-\beta)] < E[(\hat{\beta}-\beta)'(\hat{\beta}-\beta)].$$

The ridge estimator has the property that, as k increases, the
variance of $\beta^{*}$ decreases, but the bias increases. The best
regression estimates of $\beta$ are those that are stable and have
a small mean-square error.
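
The trade-off between variance and bias can be written out. The decomposition below is the standard mean-square-error result for the ridge estimator, given here as a sketch; $\lambda_i$ denotes the eigenvalues of $X'X$ (notation assumed for illustration), and the numbers use the eigenvalues printed for the sample problem in the appendix.

```latex
E\left[(\beta^{*}-\beta)'(\beta^{*}-\beta)\right]
  = \underbrace{\sigma^{2}\sum_{i=1}^{p}\frac{\lambda_{i}}{(\lambda_{i}+k)^{2}}}_{\text{total variance, decreasing in } k}
  \;+\;
  \underbrace{k^{2}\,\beta'(X'X+kI)^{-2}\beta}_{\text{squared bias, increasing in } k}
% Variance term for the sample problem, in units of \sigma^2:
%   k = 0   : 1/2.8993 + 1/0.0869 + 1/0.0138                      \approx 84.3
%   k = 0.2 : 2.8993/3.0993^2 + 0.0869/0.2869^2 + 0.0138/0.2138^2 \approx 1.66
```

The bias term depends on the unknown $\beta$, so it cannot be evaluated from the data; this is why k is chosen by inspecting the coefficients rather than by minimizing the expression directly.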
To calculate the ridge estimator $\beta^{*}$ from equation [2], one
would have to invert the $p \times p$ matrix $(X'X+kI)$ for each
value of k. This sequence of matrix inversions could be
time-consuming even with a high-speed computer. The ridge estimator
can be expressed in a form that may be better for computing
purposes.
We know from matrix theory that, because $X'X$ is symmetric, there
exists an orthogonal matrix $A$ and a diagonal matrix $D$ such that
$A'X'XA = D$ and $A'A = I$. The matrix $A$ is the matrix of
eigenvectors of $X'X$, and the matrix $D$ is the diagonal matrix of
eigenvalues of $X'X$. Adding $kI$ to both sides of $A'X'XA = D$
gives

$$A'X'XA + kI = D + kI. \qquad [3]$$

Multiplying the second term on the left-hand side of equation [3]
by $A'A$ gives

$$A'X'XA + kA'IA = D + kI, \qquad [4]$$

which can be written as

$$A'(X'X + kI)A = D + kI. \qquad [5]$$

Premultiplying both sides of equation [5] by $(A')^{-1}$ and
postmultiplying by $A^{-1}$ gives

$$X'X + kI = (A')^{-1}(D + kI)A^{-1}. \qquad [6]$$

Taking the inverse of both sides yields

$$(X'X + kI)^{-1} = A(D + kI)^{-1}A'. \qquad [7]$$

Substituting the results of equation [7] in equation [2], we find
that the ridge estimator can be written

$$\beta^{*} = A(D + kI)^{-1}A'X'Y. \qquad [8]$$
This form of the ridge estimator may be efficient for computing in
problems with a large number of independent variables. The matrix
$(D+kI)$ is diagonal, and the elements of the inverse are the
reciprocals of the diagonal elements. The matrix of eigenvectors
$A$ and the matrix of eigenvalues $D$ need to be calculated only
once. However, the algorithm for computing the eigenvalues is
iterative, and the solution may occasionally take more time than
calculating the inverses of $(X'X+kI)$.
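
The eigenvalue form of the estimator can be sketched in a few lines of Python. This is an illustration only, not the RIDGE program: the cyclic Jacobi rotation routine is a stand-in chosen because it is a simple iterative eigenvalue method (the source does not say which algorithm RIDGE uses), the function names are hypothetical, and the inputs are the 4-decimal correlation values printed in the appendix, so results are approximate.

```python
import math

# X'X and X'Y in correlation form, copied from the appendix output.
R = [[1.0000, 0.9853, 0.9386],
     [0.9853, 1.0000, 0.9247],
     [0.9386, 0.9247, 1.0000]]
r = [0.9384, 0.9064, 0.9725]


def jacobi_eigen(S, sweeps=30, tol=1e-22):
    """Eigenvalues d and eigenvector columns A of a symmetric matrix S,
    so that A'SA = D, found by cyclic Jacobi rotations."""
    n = len(S)
    M = [row[:] for row in S]
    A = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(M[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if M[p][q] == 0.0:
                    continue
                diff = M[q][q] - M[p][p]
                # Rotation angle that zeroes M[p][q]; |theta| <= pi/4.
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * M[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # M <- M J (columns p and q)
                    mp, mq = M[i][p], M[i][q]
                    M[i][p], M[i][q] = c * mp - s * mq, s * mp + c * mq
                for j in range(n):  # M <- J'M (rows p and q)
                    mp, mq = M[p][j], M[q][j]
                    M[p][j], M[q][j] = c * mp - s * mq, s * mp + c * mq
                for i in range(n):  # A <- A J (accumulate eigenvectors)
                    ap, aq = A[i][p], A[i][q]
                    A[i][p], A[i][q] = c * ap - s * aq, s * ap + c * aq
    return [M[i][i] for i in range(n)], A


def ridge_from_eigen(d, A, r, k):
    """Equation [8]: A (D + kI)^-1 A' r, using only reciprocals."""
    n = len(d)
    z = [sum(A[i][j] * r[i] for i in range(n)) / (d[j] + k) for j in range(n)]
    return [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]


d, A = jacobi_eigen(R)  # computed once, reused for every k
for k in (0.0, 0.1, 0.2):
    print(k, [round(b, 4) for b in ridge_from_eigen(d, A, r, k)])
```

Once `d` and `A` are available, each additional value of k costs only two small matrix-vector products, which is the saving the text describes.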

The estimates of the ridge coefficients at k=0 are the
least-squares estimates. If the least-squares regression is
significant, then different values of k should be explored.

The ridge trace, which is a plot of the ridge coefficients for
different values of k, is an important part of ridge regression.
The sums of squares of residuals should also be plotted. The ridge
trace is examined for trends of the ridge coefficients as k is
changed. The best estimates of the ridge coefficients are those
where the trace shows that the coefficients have stabilized and the
sum of squares of residuals is still small (Marquardt and Snee
1975).
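
A ridge trace can be tabulated directly from the correlation-form matrices. The sketch below is an illustration under stated assumptions: it uses the 4-decimal correlations printed in the appendix, the grid of k values is arbitrary (the grid supplied by RIDGE is not listed in the text), the helper names are hypothetical, and the residual sum of squares is reported as a fraction of the total, computed as 1 - 2b'r + b'Rb for standardized variables.

```python
# X'X and X'Y in correlation form, copied from the appendix output.
R = [[1.0000, 0.9853, 0.9386],
     [0.9853, 1.0000, 0.9247],
     [0.9386, 0.9247, 1.0000]]
r = [0.9384, 0.9064, 0.9725]


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [M[i][:] + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_point(k):
    """Standardized ridge coefficients and SSE/SST for one value of k."""
    n = len(r)
    Rk = [[R[i][j] + (k if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    b = solve(Rk, r)
    Rb = [sum(R[i][j] * b[j] for j in range(n)) for i in range(n)]
    rel_sse = (1.0 - 2.0 * sum(bi * ri for bi, ri in zip(b, r))
               + sum(bi * x for bi, x in zip(b, Rb)))
    return b, rel_sse


ks = [0.0, 0.05, 0.1, 0.2, 0.3, 0.5, 1.0]  # illustrative grid only
trace = [(k,) + ridge_point(k) for k in ks]
for k, b, rel_sse in trace:
    coefs = " ".join(f"{x:8.4f}" for x in b)
    print(f"k={k:4.2f}  b*={coefs}  SSE/SST={rel_sse:.4f}")
```

Reading down the printed columns gives the same information as the plotted trace: coefficients that change sign or shrink sharply at small k are the unstable ones, and the last column shows how much fit is given up in exchange.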

Hoerl and Kennard (1970b) discuss the use of the ridge trace to
eliminate variables with the least predicting power. Thus, ridge
regression can be used as a guide for selecting the best subset of
variables; that is, ridge regression is an alternative for stepwise
regression.


Program Ridge 


Program RIDGE is written in ASA Fortran IV for the IBM 370/168
computer. Information needed for the control cards is listed in the
appendix. A variable format statement is used to input the data.
The dependent variable is positioned by the program, hence special
arrangement of the data is not necessary. A maximum of 19
independent variables is allowed for program RIDGE. This capacity
may be increased by changing the dimension statements. Nineteen
values of k from 0 to 1.0 are automatically supplied by the
program. Other values of k may be designated by the user.

The means and variances of the variables are printed by program
RIDGE. The $X'X$ and $X'Y$ matrices are transformed into the
correlation form and printed.

The eigenvalues and corresponding matrix of eigenvectors for the
$X'X$ matrix are calculated. The presence of one or more zero
eigenvalues indicates linear dependencies between the independent
variables. If this condition exists, $X'X$ is singular for k=0, and
the program terminates with an error message. If no linear
dependencies are present, an analysis of variance table is printed.

Standardized and actual regression coefficients are printed for the
different values of k. The ridge trace can be plotted by the user
from the standardized coefficients. However, we found that in most
cases the tabled values of standardized coefficients provide
sufficient information for selecting the appropriate ridge
solution.

The computer program is available from the Biometrics Group,
Northeastern Forest Experiment Station.


An Example of Ridge Regression 


Suppose we have 10 sample observations for 3 independent variables
and 1 dependent variable (table 1). Computer output from program
RIDGE for this example is given in the appendix. Investigation of
the correlation matrix reveals high correlations between the
predictor variables; and one of the eigenvalues, 0.0138, is small.
These conditions suggest that ridge regression be used to estimate
the regression coefficients. Since the F ratio for the
least-squares solution is highly significant, ridge-regression
coefficients and the residual sums of squares were calculated for
19 values of k.

Table 1. Data for sample problem


  Y     X1    X2    X3
 223    11    11    11
 223    14    15    11
 292    17    18    20
 270    17    17    18
 285    18    19    18
 304    18    18    19
 311    19    18    20
 314    20    21    21
 328    23    24    25
 340    25    25    24


Figure 1. Ridge trace for 19 values of k.

The ridge trace was constructed by plotting the standardized
regression coefficients against values of k (fig. 1). The trace
suggests that the least-squares coefficients are too large in
absolute value, $\hat{\beta}_2$ even having the wrong sign. At
k=0.2 the coefficients have stabilized, and the residual sum of
squares (SSE) has not substantially increased.
The ridge regression 


Y*=132.5+2.870(X1)+1.650(X2)+3.934(X3) 


should be a better predicting equation than the least-squares
equation even though the coefficients are biased.
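
The steps from raw data to a ridge equation in the original units can be sketched as follows. This is a hedged illustration, not the RIDGE program: the function names are hypothetical, and the data are the table 1 values with the entries that are illegible in the source table restored so that they reproduce the means and variances printed in the appendix. The standardized coefficients are solved from the correlation form with k added to the diagonal and then rescaled by the ratio of standard deviations.

```python
import math

# Table 1 data (damaged entries restored to match the appendix means
# and variances; treat these as an assumption).
Y = [223, 223, 292, 270, 285, 304, 311, 314, 328, 340]
X = [[11, 14, 17, 17, 18, 18, 19, 20, 23, 25],   # X1
     [11, 15, 18, 17, 19, 18, 18, 21, 24, 25],   # X2
     [11, 11, 20, 18, 18, 19, 20, 21, 25, 24]]   # X3


def centered(v):
    """Return the mean of v and the deviations from it."""
    m = sum(v) / len(v)
    return m, [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [M[i][:] + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_equation(xs, y, k):
    """Intercept and coefficients, in original units, for a given k."""
    my, dy = centered(y)
    stats = [centered(x) for x in xs]
    means = [m for m, _ in stats]
    devs = [d for _, d in stats]
    syy = dot(dy, dy)
    sxx = [dot(d, d) for d in devs]
    p = len(xs)
    # Correlation form of X'X (with k on the diagonal) and of X'Y.
    Rk = [[dot(devs[i], devs[j]) / math.sqrt(sxx[i] * sxx[j])
           + (k if i == j else 0.0) for j in range(p)] for i in range(p)]
    r = [dot(devs[i], dy) / math.sqrt(sxx[i] * syy) for i in range(p)]
    b_std = solve(Rk, r)                       # standardized coefficients
    b = [b_std[i] * math.sqrt(syy / sxx[i]) for i in range(p)]
    b0 = my - dot(b, means)                    # intercept
    return b0, b


b0, b = ridge_equation(X, Y, 0.2)
terms = " + ".join(f"{c:.3f}(X{i + 1})" for i, c in enumerate(b))
print(f"Y* = {b0:.1f} + {terms}")
```

With k=0.2 the printed equation should agree closely with the ridge equation given above; small differences in the last digit are possible because the original program ran in lower precision.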


Summary 


Ridge regression is a statistical technique that foresters should
find useful. It is used to estimate coefficients for
multiple-regression models when the independent variables are
highly correlated.

Considerable research has been done on ridge regression. The paper
by Hoerl and Kennard (1970a) introduced ridge-regression theory.
Although there is considerable matrix algebra in this paper, it
provides a sound background for the understanding and application
of ridge regression. The subsequent paper by Hoerl and Kennard
(1970b) illustrates the applications of ridge regression, including
its use as a guide to variable selection. The article by Marquardt
and Snee (1975) is perhaps the most readable paper on ridge
regression. All aspects of ridge regression are discussed at
length, and many examples are included. Some of the other articles
listed are more mathematically sophisticated.
APPENDIX


PROGRAM RIDGE REGRESSION

PROGRAM CONTROL INFORMATION:
PROGRAM CONTROL CARDS MUST BE THE FIRST CARDS IN THE DATA DECK.

CARD 1 (REQUIRED): PROBLEM TITLE, UP TO 80 CHARACTERS LONG.
                   A BLANK CARD MAY BE SUBMITTED IF NO
                   PROBLEM TITLE IS DESIRED.

CARD 2 (REQUIRED): SPECIFY N, M, PY, D, AND V. FORMAT IS 5I5,
                   RIGHT JUSTIFIED.

                   N  = NUMBER OF OBSERVATIONS.

                   M  = NUMBER OF VARIABLES, INCLUDING Y.
                        MAXIMUM OF 19 INDEPENDENT VARIABLES.

                   PY = POSITION OF THE DEPENDENT VARIABLE
                        IN THE DATA. THE PROGRAM WILL MAKE
                        THE DEPENDENT VARIABLE THE LAST
                        VARIABLE.

                   D  = NUMBER OF INCREMENTS (K'S) FOR THE
                        X-PRIME-X MATRIX, IF INCREMENTS ARE
                        TO BE USER-SUPPLIED. MAXIMUM OF 18
                        K'S MAY BE SPECIFIED. INCREMENTS
                        WILL BE PROGRAM-SUPPLIED IF LEFT
                        BLANK. K = 0.0 IS ALWAYS SUPPLIED
                        BY THE PROGRAM, AND SHOULD NOT BE
                        SPECIFIED BY THE USER. CARD 5
                        REQUIRED IF D IS NOT BLANK.

                   V  = 1 IF VARIABLE NAMES ARE TO BE
                        SPECIFIED BY THE USER. LEAVE BLANK
                        OTHERWISE. CARD 4 REQUIRED IF V IS
                        NOT BLANK.

CARD 3 (REQUIRED): VARIABLE FORMAT FOR DATA, ENCLOSED IN
                   PARENTHESES.

CARD 4 (OPTIONAL): VARIABLE NAMES. FORMAT IS MA8, LEFT
                   JUSTIFIED. BLANKS MUST BE LEFT FOR THOSE
                   VARIABLES WITH NO NAMES IF THIS OPTION
                   IS IN EFFECT. MORE THAN ONE CARD IF
                   NECESSARY.

CARD 5 (OPTIONAL): VALUES OF INCREMENTS, K, TO BE ADDED TO
                   THE X-PRIME-X MATRIX. FORMAT IS OF5.0.
                   VALUES SHOULD HAVE DECIMAL POINTS.
                   MORE THAN ONE CARD IF NECESSARY.

VARIABLE
NO.   NAME        MEAN           VARIANCE
 1    X1       0.18200E+02     0.16178E+02
 2    X2       0.18600E+02     0.16711E+02
 3    X3       0.18700E+02     0.21789E+02
 4    Y        0.28900E+03     0.16171E+04


CORRELATION MATRIX R


VARIABLE
NO.   NAME
 1    X1     1.0000
 2    X2     0.9853   1.0000
 3    X3     0.9386   0.9247   1.0000
 4    Y      0.9384   0.9064   0.9725   1.0000


EIGENVALUES OF X-PRIME-X


VARIABLE
NO.   NAME
 1    X1     2.8993
 2    X2     0.0869
 3    X3     0.0138


MATRIX OF EIGENVECTORS OF X-PRIME-X


 0.5824   -0.3219   -0.7465
 0.5796   -0.4795    0.6589
 0.5700    0.8164    0.0927


ANALYSIS OF VARIANCE TABLE FOR
LEAST SQUARES SOLUTION (K=0.0)


                     SUMS OF
SOURCE        DF     SQUARES          MSE              F
TOTAL          9   0.14554E+05
REGRESSION     3   0.14008E+05    0.46694E+04    0.51332E+02
RESIDUAL       6   0.54579E+03    0.90965E+02